Remembering Unequally: Global and Disciplinary Bias in LLM-Generated Co-Authorship Networks
Kalhor, Ghazal, Mashhadi, Afra
Ongoing breakthroughs in Large Language Models (LLMs) are reshaping search and recommendation platforms at their core. While this shift unlocks powerful new scientometric tools, it also exposes critical fairness and bias issues that could erode the integrity of the information ecosystem. Additionally, as LLMs become more integrated into web-based searches for scholarly tools, their ability to generate summarized research work based on memorized data introduces new dimensions to these challenges. The extent of memorization in LLMs can impact the accuracy and fairness of the co-authorship networks they produce, potentially reflecting and amplifying existing biases within the scientific community and across different regions. This study critically examines the impact of LLM memorization on co-authorship networks. To this end, we assess memorization effects across three prominent models: DeepSeek R1, Llama 4 Scout, and Mixtral 8x7B, analyzing how memorization-driven outputs vary across academic disciplines and world regions. While our global analysis reveals a consistent bias favoring highly cited researchers, this pattern is not uniformly observed. Certain disciplines, such as Clinical Medicine, and regions, including parts of Africa, show more balanced representation, pointing to areas where LLM training data may reflect greater equity. These findings underscore both the risks and opportunities in deploying LLMs for scholarly discovery.
- North America > United States > Washington > King County > Bothell (0.14)
- Africa > North Africa (0.05)
- Africa > Sub-Saharan Africa (0.05)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Fake or Real: The Impostor Hunt in Texts for Space Operations
Kaczmarek, Agata, Płudowski, Dawid, Wilczyński, Piotr, Kotowski, Krzysztof, Shendy, Ramez, Ntagiou, Evridiki, Nalepa, Jakub, Janicki, Artur, Biecek, Przemysław
The "Fake or Real" competition hosted on Kaggle (https://www.kaggle.com/competitions/fake-or-real-the-impostor-hunt ) is the second in a series of follow-up competitions and hackathons related to the "Assurance for Space Domain AI Applications" project funded by the European Space Agency (https://assurance-ai.space-codev.org/ ). The competition idea is based on two real-life AI security threats identified within the project -- data poisoning and overreliance in Large Language Models. The task is to distinguish between the proper output of an LLM and output generated under malicious modification of the LLM. As this problem has not been extensively researched, participants are required to develop new techniques to address it or adapt existing ones to this problem statement.
- North America > United States (0.16)
- Europe > Poland > Masovia Province > Warsaw (0.06)
- South America (0.04)
- (5 more...)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Education (0.93)
Recommender systems, stigmergy, and the tyranny of popularity
Dunivin, Zackary Okun, Smaldino, Paul E.
Scientific recommender systems, such as Google Scholar and Web of Science, are essential tools for discovery. The search algorithms that power them work through stigmergy, a collective intelligence mechanism that surfaces useful paths through repeated engagement. While generally effective, this "rich-get-richer" dynamic results in a small number of high-profile papers dominating visibility. This essay argues that these algorithms' over-reliance on popularity fosters intellectual homogeneity and exacerbates structural inequities, stifling the innovative and diverse perspectives critical for scientific progress. We propose an overhaul of search platforms to incorporate user-specific calibration, allowing researchers to manually adjust the weights of factors like popularity, recency, and relevance. We also advise platform developers on how text embeddings and LLMs could be implemented in ways that increase user autonomy. While our suggestions are particularly pertinent to aligning recommender systems with scientific values, these ideas are broadly applicable to information access systems in general. Designing platforms that increase user autonomy is an important step toward more robust and dynamic information access.
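The user-specific calibration the essay proposes can be pictured as a re-ranking function whose factor weights are exposed to the user rather than fixed by the platform. The sketch below is illustrative only, not code from the essay; the `Paper` fields, the normalizations, and the weight names are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    citations: int    # popularity proxy
    year: int         # recency proxy
    relevance: float  # query-match score in [0, 1]

def rank(papers, w_pop=1.0, w_rec=1.0, w_rel=1.0, current_year=2024):
    """Rank papers by a user-adjustable blend of popularity, recency, and relevance."""
    max_cites = max(p.citations for p in papers) or 1
    def score(p):
        popularity = p.citations / max_cites         # normalize citations to [0, 1]
        recency = 1.0 / (1 + current_year - p.year)  # newer papers score closer to 1
        return w_pop * popularity + w_rec * recency + w_rel * p.relevance
    return sorted(papers, key=score, reverse=True)
```

Setting `w_pop=0` would surface recent, relevant but lightly cited work that a purely popularity-driven ranking buries, which is exactly the kind of manual calibration the authors advocate.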
- North America > United States > California > Yolo County > Davis (0.04)
- North America > United States > California > Merced County > Merced (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios
Joshi, Aditya, Kanojia, Diptesh, Lent, Heather, Kaing, Hour, Song, Haiyue
Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages situated in 'lower-resource' scenarios, such as dialects/sociolects (national or social varieties of a language), Creoles (languages arising from linguistic contact between multiple languages) and other low-resource languages. This introductory tutorial (selected as a tutorial at COLING 2025) identifies common challenges, approaches, and themes in natural language processing (NLP) research for confronting and overcoming the obstacles inherent to data-poor contexts. While each of the lower-resource scenarios bears its unique socio-historical context, the tutorial brings together researchers working separately in these scenarios and connects past research in terms of: challenges in data curation; potential for wide linguistic variation (e.g., existing on a linguistic continuum or eschewing strict spelling conventions); need for smart modeling choices over greedy ones; and increased model vulnerability.
- Europe > Denmark > North Jutland > Aalborg (0.05)
- Europe > Denmark > Capital Region > Copenhagen (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (3 more...)
PaperQA: Retrieval-Augmented Generative Agent for Scientific Research
Lála, Jakub, O'Donoghue, Odhran, Shtedritski, Aleksandar, Cox, Sam, Rodriques, Samuel G., White, Andrew D.
Large Language Models (LLMs) generalize well across language tasks, but suffer from hallucinations and uninterpretability, making it difficult to assess their accuracy without ground-truth. Retrieval-Augmented Generation (RAG) models have been proposed to reduce hallucinations and provide provenance for how an answer was generated. Applying such models to the scientific literature may enable large-scale, systematic processing of scientific knowledge. We present PaperQA, a RAG agent for answering questions over the scientific literature. PaperQA is an agent that performs information retrieval across full-text scientific articles, assesses the relevance of sources and passages, and uses RAG to provide answers. Viewing this agent as a question answering model, we find it exceeds the performance of existing LLMs and LLM agents on current science QA benchmarks. To push the field closer to how humans perform research on scientific literature, we also introduce LitQA, a more complex benchmark that requires retrieval and synthesis of information from full-text scientific papers across the literature. Finally, we demonstrate that PaperQA matches expert human researchers on LitQA.
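The retrieve-assess-answer loop the abstract describes follows the general RAG pattern: gather candidate passages, keep the relevant ones, and condition the generator on that evidence. The sketch below is a generic minimal RAG loop under assumed interfaces (keyword-overlap retrieval and an `llm` callable), not PaperQA's actual implementation.

```python
def retrieve(query, passages, k=2):
    """Score each passage by word overlap with the query and keep the top k relevant ones."""
    q = set(query.lower().split())
    scored = [(len(q & set(p.lower().split())), p) for p in passages]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for s, p in scored[:k] if s > 0]

def answer(query, passages, llm):
    """Assemble numbered evidence into a prompt and delegate generation to an LLM callable."""
    evidence = retrieve(query, passages)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(evidence))
    prompt = f"Answer using only the sources below.\n{context}\n\nQ: {query}"
    return llm(prompt)
```

The numbered `[1]`, `[2]` markers are one simple way to give the generated answer the provenance that the abstract credits RAG with; production systems use embedding similarity rather than word overlap for the retrieval step.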
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > China > Hong Kong (0.04)
- (3 more...)
- Research Report (1.00)
- Overview (0.93)
Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models
Parrish, Alicia, Kirk, Hannah Rose, Quaye, Jessica, Rastogi, Charvi, Bartolo, Max, Inel, Oana, Ciro, Juan, Mosquera, Rafael, Howard, Addison, Cukierski, Will, Sculley, D., Reddi, Vijay Janapa, Aroyo, Lora
The generative AI revolution in recent years has been spurred by an expansion in compute power and data quantity, which together enable extensive pre-training of powerful text-to-image (T2I) models. With their greater capabilities to generate realistic and creative content, T2I models such as DALL-E, MidJourney, Imagen, or Stable Diffusion are reaching ever wider audiences. Any unsafe behaviors inherited from pre-training on uncurated internet-scraped datasets thus have the potential to cause wide-reaching harm, for example, through generated images which are violent, sexually explicit, or contain biased and derogatory stereotypes. Despite this risk of harm, we lack systematic and structured evaluation datasets to scrutinize model behavior, especially adversarial attacks that bypass existing safety filters. A typical bottleneck in safety evaluation is achieving a wide coverage of different types of challenging examples in the evaluation set, i.e., identifying 'unknown unknowns' or long-tail problems. To address this need, we introduce the Adversarial Nibbler challenge. The goal of this challenge is to crowdsource a diverse set of failure modes and reward challenge participants for successfully finding safety vulnerabilities in current state-of-the-art T2I models. Ultimately, we aim to provide greater awareness of these issues and assist developers in improving the future safety and reliability of generative AI models. Adversarial Nibbler is a data-centric challenge, part of the DataPerf challenge suite, organized and supported by Kaggle and MLCommons.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (6 more...)
- Health & Medicine (0.68)
- Education (0.68)
- Information Technology > Security & Privacy (0.48)
- Government > Military (0.34)
BUSTED: How this professor is flushing out students who use ChatGPT
A geography professor shared his method to detect AI-generated plagiarism with Fox News. He developed it after noticing that ChatGPT produced fake citations. A college professor said he found an easy way to catch AI-generated plagiarism after finding phony citations in some of ChatGPT's content. "It's very easy to identify the fake references," said Terence Day, a physical geography professor at Okanagan College in British Columbia. "All you need to do, really, is to check them up on the internet."
ChatGPT cites the most-cited articles and journals, relying solely on Google Scholar's citation counts. As a result, AI may amplify the Matthew Effect in environmental science
ChatGPT (GPT) has become one of the most talked-about innovations in recent years, with over 100 million users worldwide. However, there is still limited knowledge about the sources of information GPT utilizes. As a result, we carried out a study focusing on the sources of information within the field of environmental science. In our study, we asked GPT to identify the ten most significant subdisciplines within the field of environmental science. We then asked it to compose a scientific review article on each subdiscipline, including 25 references. We proceeded to analyze these references, focusing on factors such as the number of citations, publication date, and the journal in which the work was published. Our findings indicate that GPT tends to cite highly-cited publications in environmental science, with a median citation count of 1184.5. It also exhibits a preference for older publications, with a median publication year of 2010, and predominantly refers to well-respected journals in the field, with Nature being the most cited journal by GPT. Interestingly, our findings suggest that GPT seems to exclusively rely on citation count data from Google Scholar for the works it cites, rather than utilizing citation information from other scientific databases such as Web of Science or Scopus. In conclusion, our study suggests that Google Scholar citations play a significant role as a predictor for mentioning a study in GPT-generated content. This finding reinforces the dominance of Google Scholar among scientific databases and perpetuates the Matthew Effect in science, where the rich get richer in terms of citations. With many scholars already utilizing GPT for literature review purposes, we can anticipate further disparities and an expanding gap between lesser-cited and highly-cited publications.
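The study's analysis reduces to summary statistics over the references GPT produced: median citation count, median publication year, and journal frequency. A minimal sketch of that computation, with hypothetical `(citations, year)` records standing in for the study's data:

```python
from statistics import median

def reference_profile(refs):
    """Summarize (citations, year) reference records: the medians the study reports."""
    cites = [c for c, _ in refs]
    years = [y for _, y in refs]
    return {
        "median_citations": median(cites),  # the study found 1184.5
        "median_year": median(years),       # the study found 2010
    }
```

A skew of `median_citations` far above the field-wide median is the signature of the Matthew Effect the authors describe: the model's reading list concentrates on already-rich publications.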
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.05)
- Europe > Czechia > Prague (0.04)
Machine learning classifies catalytic-reaction mechanisms
Danilo M. Lustosa and Anat Milo are in the Department of Chemistry, Ben-Gurion University of the Negev, Be'er Sheva 84105, Israel. The discovery of chemical reactions is influenced not only by how fast experimental data can be acquired, but also by how easily chemists can make sense of these data. Unravelling the mechanistic underpinnings of new catalytic reactions is a particularly intricate problem, often requiring expert knowledge of computational and physical organic chemistry. Nevertheless, it is important to study catalytic reactions because they represent the most efficient chemical processes.
- Asia > Middle East > Israel (0.48)
- Europe > United Kingdom > England > Greater London > London (0.08)
- Europe > Germany (0.08)
The History of AI Rights Research
This report documents the history of research on AI rights and other moral consideration of artificial entities. It highlights key intellectual influences on this literature as well as research and academic discussion addressing the topic more directly. We find that researchers addressing AI rights have often seemed to be unaware of the work of colleagues whose interests overlap with their own. Academic interest in this topic has grown substantially in recent years; this reflects wider trends in academic research, but it seems that certain influential publications, the gradual, accumulating ubiquity of AI and robotic technology, and relevant news events may all have encouraged increased academic interest in this specific topic. We suggest four levers that, if pulled on in the future, might increase interest further: the adoption of publication strategies similar to those of the most successful previous contributors; increased engagement with adjacent academic fields and debates; the creation of specialized journals, conferences, and research institutions; and more exploration of legal rights for artificial entities.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- North America > Canada > Quebec > Montreal (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- (43 more...)
- Research Report (1.00)
- Overview (1.00)
- Personal > Interview (0.45)
- Media (1.00)
- Law (1.00)
- Information Technology (1.00)
- (4 more...)